PSI-BLAST pseudocounts and the minimum description length principle

Authors

  • Stephen F. Altschul
  • E. Michael Gertz
  • Richa Agarwala
  • Alejandro A. Schäffer
  • Yi-Kuo Yu

Abstract

Position-specific score matrices (PSSMs) are derived from multiple sequence alignments to aid in the recognition of distant protein sequence relationships. The PSI-BLAST protein database search program derives the column scores of its PSSMs with the aid of pseudocounts, added to the observed amino acid counts in a multiple alignment column. In the absence of theory, the number of pseudocounts used has been a completely empirical parameter. This article argues that the minimum description length principle can motivate the choice of this parameter. Specifically, for realistic alignments, the principle supports the practice of using a number of pseudocounts essentially independent of alignment size. However, it also implies that more highly conserved columns should use fewer pseudocounts, increasing the inter-column contrast of the implied PSSMs. A new method for calculating pseudocounts that significantly improves PSI-BLAST's retrieval accuracy is now employed by default.
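As a rough illustration of how pseudocounts enter a PSSM column, the sketch below mixes observed residue counts with pseudocounts and takes log-odds scores against background frequencies. For each residue a, it computes q_a = (c_a + β g_a) / (N + β) and then score_a = ln(q_a / p_a). This is a minimal sketch under stated assumptions. It is not PSI-BLAST's implementation, nor is it the new method the abstract announces. The function `column_scores`, the uniform background, and the choice of pseudocount frequencies g equal to the background are all hypothetical simplifications. PSI-BLAST derives its pseudocount frequencies from substitution-matrix target frequencies and uses weighted rather than raw counts, and neither is modelled here.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_scores(counts, background, beta, pseudo_freqs=None):
    """Hypothetical log-odds scores (in nats) for one alignment column.

    counts       -- dict residue -> observed (unweighted) count in the column
    background   -- dict residue -> background probability p_a
    beta         -- total number of pseudocounts added to the column
    pseudo_freqs -- dict residue -> pseudocount frequency g_a; defaults to the
                    background, a simplification (PSI-BLAST derives g_a from
                    substitution-matrix target frequencies instead)
    """
    if beta <= 0:
        raise ValueError("beta must be positive so unobserved residues get finite scores")
    if pseudo_freqs is None:
        pseudo_freqs = background
    n = sum(counts.get(a, 0) for a in AMINO_ACIDS)
    scores = {}
    for a in AMINO_ACIDS:
        # Mix observed counts with beta pseudocounts distributed as pseudo_freqs.
        q = (counts.get(a, 0) + beta * pseudo_freqs[a]) / (n + beta)
        scores[a] = math.log(q / background[a])
    return scores


# Assumed uniform background over the 20 amino acids.
uniform = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
column = {"L": 10}  # a perfectly conserved column of ten leucines

for beta in (1.0, 10.0):
    s = column_scores(column, uniform, beta)
    print(f"beta={beta:>4}: L={s['L']:.3f}  A={s['A']:.3f}  spread={s['L'] - s['A']:.3f}")
```

Running this on a perfectly conserved column shows the qualitative effect the abstract points to. With fewer pseudocounts (β = 1 instead of 10), the conserved residue scores higher and the unobserved residues score lower, so the column's scores become more extreme.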

Similar resources

Complexity Approximation Principle

We propose a new inductive principle, which we call the complexity approximation principle (CAP). This principle is a natural generalization of Rissanen’s minimum description length (MDL) principle and Wallace’s minimum message length (MML) principle and is based on the notion of predictive complexity, a recent generalization of Kolmogorov complexity. Like the MDL principle, CAP can be regarded...

A tutorial introduction to the minimum description length principle

This tutorial provides an overview of and introduction to Rissanen’s Minimum Description Length (MDL) Principle. The first chapter provides a conceptual, entirely non-technical introduction to the subject. It serves as a basis for the technical introduction given in the second chapter, in which all the ideas of the first chapter are made mathematically precise. This tutorial will appear as the ...

A New Minimum Description Length

The minimum description length(MDL) method is one of the pioneer methods of parametric order estimation with a wide range of applications. We investigate the definition of two-stage MDL for parametric linear model sets and exhibit some drawbacks of the theory behind the existing MDL. We introduce a new description length which is inspired by the Kolmogorov complexity principle.
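The entry above concerns two-stage MDL for parametric linear models. The sketch below shows the conventional crude two-part code for polynomial order selection. The description length of a degree-d fit is approximated as (n/2) ln(RSS/n) + (k/2) ln n, where k = d + 1 is the number of parameters, and the degree with the shortest total length is chosen. This is the textbook form, equivalent to BIC up to constants. It is not the new description length proposed in that work. The function names and the synthetic quadratic data are hypothetical.

```python
import math
import random


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients (constant term first) via normal equations."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * k
    for i in reversed(range(k)):
        coeffs[i] = (b[i] - sum(a[i][j] * coeffs[j] for j in range(i + 1, k))) / a[i][i]
    return coeffs


def two_part_description_length(xs, ys, degree):
    """Crude two-part MDL in nats: data cost (n/2) ln(RSS/n) plus parameter cost (k/2) ln n."""
    n = len(xs)
    coeffs = fit_polynomial(xs, ys, degree)
    rss = sum((y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2
              for x, y in zip(xs, ys))
    k = degree + 1
    return 0.5 * n * math.log(rss / n) + 0.5 * k * math.log(n)


# Hypothetical data: a noisy quadratic sampled on [-3, 3].
rng = random.Random(0)
n = 50
xs = [-3 + 6 * i / (n - 1) for i in range(n)]
ys = [1 + 2 * x + 0.5 * x * x + rng.gauss(0, 1) for x in xs]

lengths = {d: two_part_description_length(xs, ys, d) for d in range(4)}
best = min(lengths, key=lengths.get)
print({d: round(v, 2) for d, v in lengths.items()}, "selected degree:", best)
```

The (k/2) ln n term is what keeps the selection from always preferring the highest degree. Each extra coefficient must shorten the data code by more than about half a log of the sample size to be worth describing.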

Minimum Description Length (MDL) Principle as a Possible Approach to Arc Detection

Detecting arcing faults is an important but difficult-to-solve practical problem. In this paper, we show how the Minimum Description Length (MDL) Principle can help in solving this problem. Mathematics Subject Classification: 68Q30, 93AXX

Layered Representation of Motion Video using Robust Maximum-Likelihood Estimation of Mixture Models and MDL

Representing and modeling the motion and spatial support of multiple objects and surfaces from motion video sequences is an important intermediate step towards dynamic image understanding. One such representation, called layered representation, has recently been proposed. Although a number of algorithms have been developed for computing these representations, there has not been a consolidated e...

Volume 37

Publication date: 2009